44,100 Hz

In digital audio, 44,100 Hz is a common sampling frequency: analog audio is recorded by sampling it 44,100 times per second, and then these samples are used to reconstruct the audio signal when playing it back. "Hz" is an abbreviation for hertz, meaning "[cycles, samples] per second", and the alternative form 44.1 kHz (kilohertz, 1000 times per second) is also very commonly found.

44.1 kHz audio is widely used because it is the sampling rate of Compact Discs; its common use dates back to Sony's adoption of the rate in 1979.

History

The 44.1 kHz sampling rate originated in the late 1970s with PCM adaptors, which recorded digital audio on video cassettes (specifically U-matic cassettes), notably the Sony PCM-1600 (1979) and subsequent models in this series. It then became the standard for Compact Disc audio in the Red Book standard (1980),[1] and its use continued in 1990s standards such as MP3 and the DVD, and in 2000s standards such as HDMI.

Why 44.1 kHz?

The rate was chosen following debate between manufacturers, notably Sony and Philips, and its implementation by Sony, yielding a de facto standard. The technical reasoning behind the rate being chosen is as follows.

Human hearing and signal processing

Firstly, the upper frequency limit of human hearing is about 20 kHz (the hearing range of human ears is roughly 20 Hz to 20,000 Hz), and by the sampling theorem the sampling rate must be at least twice the maximum frequency one wishes to reproduce, so the sampling rate had to be at least 40 kHz. In addition, signals must be low-pass filtered before sampling, otherwise aliasing occurs. An ideal low-pass filter would perfectly pass frequencies below 20 kHz (without attenuating them) and perfectly cut off frequencies above 20 kHz, but in practice a transition band is necessary, where frequencies are partly attenuated. The wider this transition band is, the easier and cheaper it is to make a low-pass filter, which favors a higher sampling rate. The resulting increase in sample rate is, by the sampling theorem, twice the bandwidth of the transition band: for example, a 2 kHz transition band (passing 20 kHz almost completely, cutting 22 kHz almost completely) requires 44 kHz sampling.
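The relationship between transition band and sampling rate can be checked with a short calculation. The following is a minimal sketch, not part of any standard: the function name `min_sampling_rate` is hypothetical, and it assumes the Nyquist frequency is placed exactly at the top of the transition band.

```python
def min_sampling_rate(passband_hz, transition_hz):
    """Return the lowest sampling rate (Hz) for a given passband and transition band.

    Assumption of this sketch: the filter must fully cut off at the Nyquist
    frequency, so Nyquist = passband + transition, and the rate is twice that.
    """
    stopband_hz = passband_hz + transition_hz
    return 2 * stopband_hz


# Ideal (brick-wall) filter: no transition band at all.
print(min_sampling_rate(20_000, 0))      # 40000
# The 2 kHz transition band used as the example in the text.
print(min_sampling_rate(20_000, 2_000))  # 44000
```

Each additional hertz of transition band raises the required sampling rate by two hertz, which is why a cheaper filter implies a higher rate.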

Recording on video equipment

Early digital audio was recorded to existing analog video cassette tapes, as these were the only available media with sufficient capacity to store meaningful lengths of audio; formally, the video cassette is the transport, and this format has been termed pseudo-video.[2] To enable reuse with minimal modification of the video equipment, these recorders ran at the same speed as video and used much of the same circuitry. Specifically, audio samples were recorded as if they were on the lines of a raster scan of video.

Analog video standards represent video at a field rate of 60 Hz (NTSC, North America; 60/1.001 Hz ≈ 59.94 Hz for color NTSC) or 50 Hz (PAL, Europe), which corresponds to a frame rate of 30 frames per second (frame/s) or 25 frame/s. Each field carries half the lines of an interlaced image (alternating the odd lines and the even lines) and is in turn composed of lines (see raster scan): a frame has 625 lines for PAL and 525 lines for NTSC, though some of these "lines" are actually used for synchronizing the signal (see vertical blanking interval). Digital audio samples were encoded along each line, thus allowing reuse of the existing synchronization circuitry; viewed as video, the resulting images look like lines of binary black and white (rather, gray) dots along each scan line.

The line frequency (lines per second) was 15,625 Hz for PAL (625 × 50/2), 15,750 Hz for 60 Hz (monochrome) NTSC (525 × 60/2), and 15,750/1.001 Hz (approximately 15,734.26 Hz) for 59.94 Hz (color) NTSC. Recording audio at the required rate of over 40 kHz therefore meant encoding multiple samples per line, with 3 samples per line being sufficient, yielding up to 15,625 × 3 = 46,875 samples per second for PAL and 15,750 × 3 = 47,250 for NTSC. It was desirable to minimize the number of samples per line, so that each sample could have more space devoted to it, making it easier to achieve a higher bit depth (16 bits, rather than 14 or 12 bits, say) and better error tolerance; in practice the signal was stereo, requiring 3 × 2 = 6 samples per line. However, some of these lines were devoted to (vertical) synchronization: the lines during the vertical blanking interval (VBI) could not be used, so a maximum of 490 lines per frame (245 lines per field) could be used in NTSC, and about 588 lines per frame (294 lines per field) in PAL. (Note that in video, PAL has up to 575 visible lines[3] while NTSC has up to 485.[4])
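The line frequencies and the resulting upper bounds on the sample rate can be reproduced as follows. This is an illustrative sketch only; the names `line_frequency` and `SYSTEMS` are hypothetical, and exact fractions are used so that the 1/1.001 factor for color NTSC does not introduce rounding error.

```python
from fractions import Fraction

SAMPLES_PER_LINE = 3  # per channel, as stated in the text

# (total lines per frame, field rate in Hz) for each video system
SYSTEMS = {
    "PAL": (625, Fraction(50)),
    "NTSC monochrome": (525, Fraction(60)),
    "NTSC color": (525, Fraction(60_000, 1001)),  # 60/1.001 Hz
}


def line_frequency(lines_per_frame, field_rate):
    """Lines per second: two interlaced fields make up one frame."""
    return lines_per_frame * field_rate / 2


for name, (lines, field_rate) in SYSTEMS.items():
    f_line = line_frequency(lines, field_rate)
    upper_bound = f_line * SAMPLES_PER_LINE  # if every line carried audio
    print(f"{name}: {float(f_line):.2f} lines/s, "
          f"at most {float(upper_bound):.2f} samples/s")
```

These figures are upper bounds: they assume every line carries audio, which the vertical blanking interval prevents.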

NTSC and PAL compatibility

It is simplest if the same number of lines is used in each field and, crucially, it was decided to adopt a sample rate that could be used on both NTSC (monochrome) and PAL equipment. NTSC has a field rate of 60 Hz and PAL has a field rate of 50 Hz; their least common multiple is 300 Hz, so with 3 samples per line the sample rate must be a multiple of 900 Hz. For NTSC the sample rate is 5m × 60 × 3, where 5m is the number of active lines per field, which must be a multiple of 5 (the rest being used for synchronization), and for PAL the sample rate is 6n × 50 × 3, where 6n is the number of active lines per field, which must be a multiple of 6.

The sampling rates that satisfy these requirements are at least 40 kHz (so that 20 kHz sounds can be encoded), no more than 46.875 kHz (so that no more than 3 samples per line are required in PAL), and a multiple of 900 Hz (so that they can be encoded in both NTSC and PAL). The candidates are thus 40.5, 41.4, 42.3, 43.2, 44.1, 45, 45.9, and 46.8 kHz. The lower ones are eliminated because low-pass filters require a transition band, while the higher ones are eliminated because some lines are required for the vertical blanking interval; 44.1 kHz was the highest usable rate, and was eventually chosen.
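The elimination argument can be sketched in a few lines. The helper names below (`candidate_rates`, `lines_per_field`, `usable`) are hypothetical, and the per-field line limits (245 for NTSC, 294 for PAL) are taken from the figures quoted earlier in this article rather than from the video standards themselves.

```python
SAMPLES_PER_LINE = 3

# system -> (field rate in Hz, maximum usable lines per field)
LIMITS = {"NTSC": (60, 245), "PAL": (50, 294)}


def candidate_rates(low=40_000, high=46_875, step=900):
    """All multiples of `step` in the closed range [low, high]."""
    first = -(-low // step) * step  # round low up to a multiple of step
    return list(range(first, high + 1, step))


def lines_per_field(rate, field_rate):
    """Active lines per field needed to carry `rate` samples per second."""
    return rate // (field_rate * SAMPLES_PER_LINE)


def usable(rate):
    """True if the rate fits within the usable lines of both systems."""
    return all(lines_per_field(rate, field_rate) <= max_lines
               for field_rate, max_lines in LIMITS.values())


rates = candidate_rates()
print(rates)
# [40500, 41400, 42300, 43200, 44100, 45000, 45900, 46800]
print(max(r for r in rates if usable(r)))
# 44100
```

At 45 kHz, NTSC would already need 250 active lines per field, more than the 245 available, which is why every candidate above 44.1 kHz drops out.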

Conclusion

The actual choice of rate was a point of some debate, with other alternatives including 44,100/1.001 ≈ 44.056 kHz (corresponding to the NTSC color field rate of 60/1.001 ≈ 59.94 Hz) or approximately 44 kHz, proposed by Philips. Ultimately Sony prevailed on both sample rate (44.1 kHz) and bit depth (16 bits per sample, rather than 14 bits per sample).

The sample rate is composed as follows:

NTSC:

245 × 60 × 3 = 44,100
245 active lines/field × 60 fields/second × 3 samples/line = 44,100 samples/second
(490 active lines per frame, out of 525 lines total)

PAL:

294 × 50 × 3 = 44,100
294 active lines/field × 50 fields/second × 3 samples/line = 44,100 samples/second
(588 active lines per frame, out of 625 lines total)

In actual practice, different machines used different video cassettes – for example, the Sony PCM-1610 only used 525/60 monochrome video (NTSC, US), not 625/50 (PAL, Europe) or NTSC color.

Alternative rates

Several other sampling rates were also used in early digital audio, most significantly 48 kHz, discussed below under Status.

Earlier rates included a 50 kHz sample rate, used by Soundstream (by Thomas Stockham) in the 1970s, following a 37 kHz prototype.

In the early 1980s, a 32 kHz sampling rate was used in broadcast (esp. in UK and Japan), because this was sufficient for FM stereo broadcasts, which had 15 kHz bandwidth.

Some digital audio was provided for domestic use in 2 incompatible EIAJ formats, corresponding to 525/59.94 (44,056 Hz sampling) and 625/50 (44.1 kHz sampling).

Lastly, in what appears to be a coincidence, the 44.1 kHz sampling rate is exactly 4 times the line frequency of the old 441-line German TV standard, which had a line frequency of 441 × 50 ÷ 2 = 11,025 Hz (441 lines per frame, 50 fields per second, 2 fields per frame).

See sampling rate: audio for further rates.

Related rates

Various multiples and submultiples of 44.1 kHz are used. The lower rates of 11.025 kHz and 22.05 kHz are found in WAV files and are suitable for low-bandwidth applications, while the higher rates of 88.2 kHz and 176.4 kHz are used in mastering and in DVD-Audio. The higher rates are useful both for the usual reason of providing additional resolution (hence less sensitivity to distortions introduced by editing) and because they make low-pass filtering easier, since a much larger transition band (between human-audible at 20 kHz and the sampling rate) is possible. The 88.2 kHz and 176.4 kHz rates are primarily used when the ultimate target is a CD.
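To make the filtering argument concrete, the sketch below lists each related rate with its Nyquist frequency and the room left for a transition band above 20 kHz. It is a rough illustration: the function names are hypothetical, and the transition band is measured up to the Nyquist frequency (half the sampling rate), which is an assumption of this sketch.

```python
AUDIBLE_LIMIT_HZ = 20_000
RELATED_RATES_HZ = [11_025, 22_050, 44_100, 88_200, 176_400]


def nyquist(rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return rate_hz / 2


def transition_room(rate_hz):
    """Width (Hz) available between 20 kHz and Nyquist; 0 if Nyquist is lower."""
    return max(0.0, nyquist(rate_hz) - AUDIBLE_LIMIT_HZ)


for rate in RELATED_RATES_HZ:
    print(f"{rate} Hz: Nyquist {nyquist(rate)} Hz, "
          f"transition room {transition_room(rate)} Hz")
```

Under these assumptions, 44.1 kHz leaves 2,050 Hz for the filter to roll off, 88.2 kHz leaves 24,100 Hz, and the two lowest rates cannot represent the full audible band at all.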

Consequences

Subsequently, the DAT format was released in 1987 with 48 kHz sampling; this sample rate, which is a rounder number and also allows a larger transition band in low-pass filtering, has also become common. Converting between these sample rates (sample rate conversion) was initially difficult because of the relatively large numbers in the ratio between them, 44,100:48,000 = 147:160, but is easy today. This difference was initially exploited to make it difficult to copy 44.1 kHz CDs using 48 kHz DAT equipment.
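The 147:160 ratio follows from dividing both rates by their greatest common divisor. The sketch below, with a hypothetical helper `reduced_ratio`, also shows the common intermediate rate that a straightforward rational resampler would pass through (upsample by one factor, then downsample by the other); this is one textbook approach, not necessarily how any particular converter is implemented.

```python
from math import gcd


def reduced_ratio(rate_a, rate_b):
    """Return rate_a:rate_b in lowest terms as a tuple of integers."""
    divisor = gcd(rate_a, rate_b)
    return rate_a // divisor, rate_b // divisor


cd_rate, dat_rate = 44_100, 48_000
cd_factor, dat_factor = reduced_ratio(cd_rate, dat_rate)
print(cd_factor, dat_factor)  # 147 160

# Converting 44.1 kHz to 48 kHz: upsample by 160, then downsample by 147.
intermediate_rate = cd_rate * dat_factor
print(intermediate_rate)      # 7056000
```

The intermediate rate of about 7 MHz illustrates why the conversion was demanding for early hardware.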

Status

Due to the popularity of CDs, a great deal of 44.1 kHz equipment exists, as does a great deal of audio recorded in 44.1 kHz (or multiples thereof). However, some more recent standards use 48 kHz in addition to or instead of 44.1 kHz. In video, 48 kHz is now the standard, but for audio targeted at CDs, 44.1 kHz (and multiples) are still used.

The HDMI TV standard (2003) allows both 44.1 kHz and 48 kHz (and multiples), which provides compatibility with DVD players playing CD, VCD and SVCD content, while the DVD and Blu-ray Disc standards use 48 kHz only.

Most audio processors and sound cards contain DACs for both 44.1 kHz and 48 kHz and can natively output either, though some older processors include only 44.1 kHz output, and some cheaper newer processors include only 48 kHz output, requiring digital sample rate conversion to output other sample rates. Similarly, processors may be able to record natively at only certain sample rates.

References

  1. ^ See Watkinson for detailed discussion of the history and diagrams.
  2. ^ Wilkinson
  3. ^ ITU-R BT.470-6
  4. ^ SMPTE 170M
  • The Art of Digital Audio, John Watkinson, 2nd edition
    • Watkinson, section 1.14: "The PCM adaptor", pp. 22–24
    • Watkinson, section 4.5: "Choice of sampling rate", pp. 207–209
    • Watkinson, section 9.2: "PCM adaptors", pp. 499–502
  • [2-35] Why 44.1KHz? Why not 48KHz?, CD-Recordable FAQ, by Andy McFadden et al.